Abstract

A short account is given of the BCM theory of synaptic plasticity: its assumptions, consequences, comparison with experiment, and statistical properties. In addition, a framework for comparison with other theoretical ideas is presented.


1 Introduction

Because of its great complexity, visual cortex would not seem to be an auspicious region of the brain in which to investigate synaptic plasticity or the mechanisms and sites of memory storage. In addition, it is almost certain that much of the architecture of visual cortex is preprogrammed genetically, leaving a relatively minor percentage to be shaped or modified by experience. However, visual cortex is accessible to single-cell electrophysiology: the output of individual cells can be measured, while the inputs can be controlled by varying the visual experience of the animal. This has made it a preferred area for experimentation and analysis. Thus, over the past 30 years, a great deal of experimental and theoretical work has been done to investigate the responses of visual cortical cells, as well as the alterations in these responses under various visual rearing conditions.

It is widely believed that much of the learning and resulting organization of visual cortex, as well as of other parts of the central nervous system, occurs through modification of the efficacy or strength of at least some of the synaptic junctions between neurons, thus altering the relation between presynaptic and postsynaptic potentials. The vast amount of experimental work done in visual cortex, particularly area 17 of cat and monkey, strongly indicates that one is observing a process of synaptic modification dependent on the information locally and globally available to the cortical cells. Furthermore, it is known that small but coherent modifications of large numbers of synaptic junctions can result in distributed memories. Whether and how such synaptic modification occurs, what precise forms it takes, and what its physiological and/or anatomical bases are, are among the most interesting questions in this area. There is no need to assume that such mechanisms operate in exactly the same manner in all portions of the nervous system or in all animals. However, one would hope that certain fundamental similarities exist, so that a detailed analysis of the properties of these mechanisms in one preparation would lead to some conclusions that are generally applicable.

It is our hope that such a general form of modifiability manifests itself in at least some cells of visual cortex that are accessible to experiment. If so, one may then be able to distinguish between different cortical plasticity theories using theoretical tools together with sophisticated experimental paradigms. Among the difficulties faced by theoreticians are (1) an adequate representation of the visual environment; (2) knowledge of what the actual inputs to cortical cells are; (3) the appropriate rule for synaptic modification; and (4) an adequate representation of the complex architecture of visual cortex.

In this article, we give a brief overview of the BCM theory of visual cortical plasticity, which has been developed over the past ten years, address the difficulties mentioned above, and compare the consequences of the theory with experiment. We discuss recent physiological experiments that seem to verify some of the underlying assumptions of the theory, and, finally, we initiate a comparison of the BCM theory with other theories that have been proposed. We assume that the reader has some familiarity with experiments demonstrating plasticity in visual cortex; a brief review may be found in Clothiaux et al. (1991).

2 BCM Theory

In what follows we give a brief overview of the BCM theory of synaptic plasticity. For a more detailed account, the reader is referred to the references cited below.

2.1 Single Cell

A typical neuron in striate cortex receives thousands of afferents from other cells. Most of these afferents derive from the lateral geniculate nucleus (LGN) and from other cortical neurons. We have approached the analysis of this complex network in several stages. In the first stage we consider a single neuron with inputs from both eyes (via the LGN) but without intracortical interactions.

The output of this neuron (in the linear region) can be written

$$c = \mathbf{m}^l \cdot \mathbf{d}^l + \mathbf{m}^r \cdot \mathbf{d}^r,$$

where $\mathbf{d}^l$ ($\mathbf{d}^r$) are the LGN inputs coming from the left (right) eye to the vector of synaptic junctions $\mathbf{m}^l$ ($\mathbf{m}^r$). The neuron's firing rate (in the linear region) is therefore the sum of the inputs from the left eye multiplied by the corresponding left-eye synaptic weights plus the inputs from the right eye multiplied by the corresponding right-eye synaptic weights; the neuron thus integrates signals from the two eyes. (For simplicity, whenever possible we shall omit the left and right superscripts.) According to the theory presented by Bienenstock, Cooper and Munro (BCM, 1982), each synaptic weight $m_j$ changes over time as a function of local and global variables; its rate of change, $\dot{m}_j$, is

$$\dot{m}_j = F(d_j, \ldots, m_j;\ d_k, \ldots, c;\ \bar{c};\ X, Y, Z).$$

Here variables such as $d_j, \ldots, m_j$ are designated local. They represent information (such as the incoming signal, $d_j$, and the strength of the synaptic junction, $m_j$) available locally at the synaptic junction $m_j$. Variables such as $d_k, \ldots, c$ are designated quasi-local. They represent information (such as $c$, the firing rate of the postsynaptic cell, or $d_k$, the incoming signal to another synaptic junction) that is not locally available to the junction $m_j$ but is physically connected to the junction by the cell body itself, thus necessitating some form of internal communication between various parts of the cell and its synaptic junctions. Variables such as $\bar{c}$ (the time-averaged output of the cell) are averaged local or quasi-local variables. Global variables are designated $X, Y, Z, \ldots$. These latter represent information (e.g., the presence or absence of neurotransmitters such as norepinephrine, or the average activity of large numbers of cortical cells) that is present in a similar fashion for all or a large number of cortical neurons (as distinguished from local or quasi-local variables, which presumably carry detailed information that varies from synapse to synapse or from cell to cell). Neglecting global variables, one arrives at the following form of the synaptic modification equation:

$$\dot{m}_j = \phi(c, \theta_M)\, d_j \tag{2.1}$$

so that the synaptic junction $m_j$ changes its value in time as the product of the input activity (the local variable $d_j$) and a function $\phi$ of the quasi-local and time-averaged quasi-local variables $c$ and $\theta_M$. Here $\theta_M$ is a nonlinear function of some time-averaged measure of cell activity, which in the original BCM formulation was proposed as

$$\theta_M = (\bar{c})^2. \tag{2.2}$$

In BCM, this time average is replaced, for simplicity, by a spatial average over the environmental inputs ($\bar{c} \rightarrow \overline{\mathbf{m} \cdot \mathbf{d}}$). The shape of the function $\phi$ is given in Figure 2 for two different values of the threshold $\theta_M$. The occurrence of negative and positive regions for $\phi$ results in the cell becoming selectively responsive to subsets of stimuli in the visual environment. This happens because the response of the cell is diminished for those patterns for which the output, $c$, is below threshold ($\phi$ negative), while the response is enhanced for those patterns for which the output is above threshold ($\phi$ positive). The nonlinear variation of the threshold $\theta_M$ with the average output of the cell contributes to the development of selectivity and to the stability of the system (Bienenstock et al., 1982; Intrator and Cooper, 1992).
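
To make the roles of (2.1) and (2.2) concrete, the following is a minimal Python/NumPy sketch, not the simulation code used by BCM or by later authors. It assumes a linear neuron, the quadratic modification function $\phi(c, \theta_M) = c(c - \theta_M)$ (one common choice with the required negative and positive regions), a small set of equiprobable orthonormal input patterns, and the spatial-average simplification $\theta_M = (\bar{c})^2$ with $\bar{c}$ averaged over the patterns. The learning rate `eta`, the step count and the helper names are illustrative.

```python
import numpy as np


def phi(c, theta):
    """Quadratic BCM modification function: negative for 0 < c < theta, positive for c > theta."""
    return c * (c - theta)


def train_bcm(patterns, m0, eta=0.01, steps=20000):
    """Environment-averaged BCM learning for a single linear neuron.

    patterns: array of shape (K, n); each row is one input vector d, all equally likely.
    m0: initial synaptic weight vector of length n.
    The threshold follows (2.2), theta_M = (c_bar)**2, where c_bar is the
    spatial average of c = m . d over the K patterns.
    """
    m = np.array(m0, dtype=float)
    K = len(patterns)
    for _ in range(steps):
        c = patterns @ m                  # response to every pattern
        theta = c.mean() ** 2             # sliding threshold (2.2)
        # Euler step of (2.1): dm/dt = phi(c, theta_M) d, averaged over the patterns
        m += eta * (phi(c, theta) @ patterns) / K
    return m


patterns = np.eye(4)                      # four orthonormal "stimuli" (illustrative)
m = train_bcm(patterns, [0.5, 0.4, 0.3, 0.2])
responses = patterns @ m
print(np.round(responses, 3))
```

Starting from weights that respond a little to every pattern, the simulated neuron ends up responding to only one of them (here the pattern with the largest initial response), at the fixed point $c = \theta_M = K^2$ for $K$ orthonormal patterns, while the other responses decay to zero. This is the selectivity described above in its simplest setting.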

2.2 Cortical Network: Mean Field Theory

The actual cortical network is very complex. It includes different cell types, intracortical interactions, and recurrent collaterals. In what follows we present a method of analyzing this complex system. The first step is to divide the inputs to any cell into those from the LGN and those from all other sources. The activity of neuron $i$ is affected by its input vector $\mathbf{d}$ from the LGN and by the adjacent cortical neurons:


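$$c_i = \mathbf{m}_i \cdot \mathbf{d} + \sum_j L_{ij} c_j, \tag{2.3}$$

where $L_{ij}$ are the cortico-cortical synapses. Scofield and Cooper (1985, 1988) have analyzed a network extension of the single-cell theory and a mean field approximation to the full network. Defining $\bar{c} = \frac{1}{N}\sum_i c_i$, where $N$ is the number of neurons in the network, the mean field approximation is obtained by replacing the inhibitory contribution of cell $j$, $c_j$, by its average value, so that $c_i$ becomes

$$c_i = \mathbf{m}_i \cdot \mathbf{d} + \bar{c} \sum_j L_{ij}. \tag{2.4}$$

Averaging (2.4) over $i$ gives the consistency condition $\bar{c} = \bar{\mathbf{m}} \cdot \mathbf{d} + \bar{c} L_0$, so that $\bar{c} = (1 - L_0)^{-1}\, \bar{\mathbf{m}} \cdot \mathbf{d}$, where $\bar{\mathbf{m}} = \frac{1}{N}\sum_i \mathbf{m}_i$ and $L_0 = \frac{1}{N}\sum_{ij} L_{ij}$. Hence

$$c_i = \Big(\mathbf{m}_i + (1 - L_0)^{-1}\, \bar{\mathbf{m}} \sum_j L_{ij}\Big) \cdot \mathbf{d}.$$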

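If we assume that the lateral connection strengths are a function only of the relative distance $i - j$ (taken modulo $N$), then $L_{ij}$ becomes a circulant matrix, so that $\sum_i L_{ij} = \sum_j L_{ij} = L_0$, and

$$c_i = \big(\mathbf{m}_i + L_0 (1 - L_0)^{-1}\, \bar{\mathbf{m}}\big) \cdot \mathbf{d}. \tag{2.5}$$

In the mean field approximation one can therefore write $c_i(\alpha) = (\mathbf{m}_i - \alpha) \cdot \mathbf{d}$, with $\alpha = |L_0|(1 + |L_0|)^{-1}\, \bar{\mathbf{m}}$ (for net inhibition, $L_0 < 0$).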
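
As a numerical check on (2.3) to (2.5), the sketch below (Python/NumPy; all sizes, weights and the lateral kernel are made up for illustration) builds a hypothetical circulant, purely inhibitory lateral matrix, solves the full linear network (2.3) exactly, and compares it with the mean field expression (2.5) and with the translated single-cell form $(\mathbf{m}_i - \alpha) \cdot \mathbf{d}$.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 8, 5                          # N cortical cells, n LGN inputs (illustrative)
M = rng.normal(size=(N, n))          # row i is the LGN weight vector m_i
d = rng.normal(size=n)               # one LGN input vector

# Hypothetical lateral kernel: inhibition decaying with cyclic distance,
# no self-connection. Row i is the kernel rolled by i, so L is circulant.
dist = np.minimum(np.arange(N), N - np.arange(N))
kernel = -0.05 * np.exp(-dist.astype(float))
kernel[0] = 0.0
L = np.array([np.roll(kernel, i) for i in range(N)])
L0 = L.sum(axis=1)[0]                # common row (and column) sum, negative here

m_bar = M.mean(axis=0)
c_exact = np.linalg.solve(np.eye(N) - L, M @ d)    # full network, eq. (2.3)
c_mf = M @ d + L0 / (1 - L0) * (m_bar @ d)         # mean field, eq. (2.5)
alpha = abs(L0) / (1 + abs(L0)) * m_bar
c_alpha = (M - alpha) @ d                          # translated single-cell form

print(np.allclose(c_mf, c_alpha))                  # (2.5) equals the translation form
print(np.isclose(c_exact.mean(), c_mf.mean()))     # same network-average activity
```

Because every column of a circulant $L$ sums to $L_0$, averaging the exact solution of (2.3) over cells gives $\bar{c} = (1 - L_0)^{-1}\, \bar{\mathbf{m}} \cdot \mathbf{d}$ exactly; the individual $c_i$ generally differ from their mean field values, which is the approximation made in (2.4).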

When the position and stability of the fixed points are analyzed using this approximation, it follows, under some mild assumptions on the evolution of the average synaptic weights, that there is a mapping

$$\mathbf{m}'_i \longleftrightarrow \mathbf{m}_i(\alpha) - \alpha$$

such that for every neuron in such a network with synaptic weight vector $\mathbf{m}_i$ there is a corresponding neuron with weight vector $\mathbf{m}'_i$ that undergoes the same evolution (around the fixed points), subject to a translation by $\alpha$.

Although the averaged-inhibition assumption used in the mean field theory is an approximation, the mean field network described above provides a powerful tool for analyzing a certain type of network architecture in great detail and for gaining an intuitive understanding of a complex network in terms of the behavior of a single neuron.

2.3 Synapses with Varying Modifiability

In the equations above, all synapses are taken to be modifiable in the same way. However, the behavior of visual cortical cells in various rearing conditions suggests that some cells respond more rapidly to environmental changes than others. In monocular deprivation, for example, some cells remain responsive to the closed eye in spite of the very large shift of most cells to the open eye. Hubel and Wiesel (1959) and Singer (1977) found, using intracellular recording, that geniculocortical synapses on inhibitory interneurons are more resistant to monocular deprivation than are synapses on pyramidal cell dendrites. These results suggest that some LGN-cortical synapses modify rapidly, while others modify relatively slowly, with slow modification of some cortico-cortical synapses. Excitatory LGN-cortical synapses onto excitatory cells may be those that modify primarily. Since these synapses are formed exclusively on dendritic spines, this raises the possibility that the mechanisms underlying synaptic modification exist primarily in axospinous synapses. To embody these facts we introduce two types of LGN-cortical synapses: those ($\mathbf{m}_i$) that modify according to the BCM modification rule and those ($\mathbf{z}_i$) that remain relatively constant. In a cortical network with modifiable and nonmodifiable LGN-cortical synapses, and nonmodifiable cortico-cortical synapses $L_{ij}$, the synaptic evolution equations become


$$\dot{\mathbf{m}}_i = \phi(c_i, \theta_M^i)\, \mathbf{d}, \qquad \dot{\mathbf{z}}_i = 0, \qquad \dot{L}_{ij} = 0. \tag{2.6}$$

As will be discussed below, such a network is capable of explaining the variety of experiments considered.
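
The division of labor in (2.6) can be sketched as a single update step in Python/NumPy. This is an illustrative sketch, not the authors' model: it assumes, by analogy with (2.3), that cell $i$ receives the LGN drive $(\mathbf{m}_i + \mathbf{z}_i) \cdot \mathbf{d}$ plus lateral input $\sum_j L_{ij} c_j$; it uses the quadratic $\phi(c, \theta) = c(c - \theta)$; and it takes the per-cell thresholds $\theta_M^i$ as given. The function name `network_step` is hypothetical.

```python
import numpy as np


def phi(c, theta):
    """Quadratic BCM modification function."""
    return c * (c - theta)


def network_step(M, Z, L, d, theta, eta=0.01):
    """One Euler step of (2.6) for a single LGN input vector d.

    M: (N, n) modifiable LGN-cortical weights m_i (one row per cell)
    Z: (N, n) non-modifiable LGN-cortical weights z_i (assumed to add to the drive)
    L: (N, N) fixed cortico-cortical weights L_ij
    theta: length-N array of modification thresholds theta_M^i
    """
    N = len(L)
    # Steady-state activities of the linear network: c = (M + Z) d + L c
    c = np.linalg.solve(np.eye(N) - L, (M + Z) @ d)
    # Only the m_i are updated; Z and L are deliberately left unchanged
    M_new = M + eta * np.outer(phi(c, theta), d)
    return M_new, c


rng = np.random.default_rng(2)
N, n = 3, 4
M, Z = rng.normal(size=(N, n)), rng.normal(size=(N, n))
L = -0.1 * (np.ones((N, N)) - np.eye(N))   # uniform mutual inhibition (illustrative)
d = rng.normal(size=n)
M_new, c = network_step(M, Z, L, d, theta=np.ones(N))
```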
3 BCM and the Neurobiology of Synaptic Modification

The BCM theory and its recent extensions originated as an attempt to account for the varied consequences of different visual environments on the developing visual cortex. In cats, the circuitry of the visual cortex can be modified by simple manipulations of visual experience during a “critical period” in the first few months of postnatal development. For example, one such manipulation, monocular deprivation, leads to a disconnection of the inputs from the deprived eye, which renders the animal behaviorally blind through that eye. The goal of the BCM theory is to develop a model of synaptic modification that accounts for the striking changes in visual cortex that result from alterations in the patterns and amount of activity arising in the two retinae. While the theory aims to provide a physiologically plausible account of synaptic plasticity, it does not address the mechanism by which plasticity diminishes at the end of the critical period. A number of mechanisms have been proposed to account for the short duration of the plastic period, but at present it is not clear that the length of the critical period is determined by the same mechanism as that underlying synaptic change.

The validity of the BCM theory, as with any theory, can be tested in two ways. The first is to derive predictions or consequences of the theory in various situations that can be compared with experimental results. There is a considerable experimental literature on visual cortical plasticity, reaching back 30 years, which facilitates such comparisons with the BCM theory. The second approach is to attempt to verify the underlying assumptions of the theory, particularly those assumptions that distinguish it from others. In the case of BCM, the most important and distinctive assumptions concern the form of the synaptic modification function $\phi$ and the movement of the modification threshold. Over the last five years we have made significant progress using both of these approaches, and this work is summarized briefly below.

3.1 Comparison of Theory and Experiment

In work recently published by Eugene Clothiaux and colleagues (Clothiaux et al., 1991), the consequences of the BCM theory were compared in detail with the results of experiments on what were called “classical” rearing conditions. These conditions include normal binocular vision, monocular deprivation, reverse suture, strabismus, and binocular deprivation, as well as the restoration of normal binocular vision after various forms of deprivation. Comparisons with the pharmacological manipulations that affect visual cortical plasticity (e.g., Greuel et al., 1987; Reiter and Stryker, 1988; Bear et al., 1990) were not considered and remain an area ripe for further work. The modifications considered by Clothiaux et al. were those that occur in kitten visual cortex during the second postnatal month after brief (≤ 2 weeks) changes in visual experience. Particular attention was given to the manner in which the theory predicts that changes in visual experience should affect the binocularity of cortical neurons and the selectivity of these neurons for the stimulus pattern (e.g., its orientation). It is these properties of binocularity and selectivity that distinguish cortical neurons from those in the retina and thalamus. A review of the experimental literature as it relates to the modification of these properties may be found in Clothiaux et al. (1991).

All theories of visual cortical plasticity have to make some assumption as to how the initial visual scenes are converted into LGN firing rates and how this information reaches visual cortex. We wish to model the input to visual cortex that arises from the regions of the two retinae that view the same point in visual space. For simplicity, Clothiaux et al. assumed that LGN activity is a direct reflection of retinal ganglion cell activity. Two types of LGN-cortical input were modeled: (1) activity elicited when visual contours are presented to the retinae, which we call “pattern” input; and (2) activity that arises in the absence of visual contours, which we call “noise”. From our point of view, the important distinction between pattern and noise input is the degree of correlation that the two types of input produce in the LGN. For a specific input “pattern”, the activity of one LGN neuron is assumed to have a predictable relationship (i.e., correlation) to the activity of other LGN neurons, while for “noise” the activity of one LGN neuron is independent of the activity of the other LGN neurons.¹ Differences between distinct patterns (for example, between various stimulus orientations) are reflected by differences in their distribution of activity across the LGN. Using this type of pattern input distorted by noise, and noise alone, Clothiaux et al. were able to reproduce both the outcome and the kinetics of synaptic change in visual cortex resulting from normal visual experience and from a wide variety of visual deprivation conditions.

As one example of the quantitative nature of these results, consider the simulation of the effects of monocular deprivation (Figure 7, Clothiaux et al., 1991). Beginning from a state in which the simulated neuron is binocularly responsive and selective, substituting noise for the pattern input through one eye leads to a rapid synaptic disconnection of the “deprived” eye. If the result is governed by the principles of the BCM theory, mathematical analysis provides a complete account of the factors on which it depends. For example, for this result to be obtained using BCM it is necessary that the neuron be selective (i.e., that it respond vigorously to only a fraction of the patterns presented to the “open” eye) before the ocular dominance changes, and that the deprived-eye inputs carry noise (i.e., that they be active). The prediction that the ocular dominance shift depends on neuronal selectivity was tested by Paradiso and colleagues (Ramoa et al., 1988). They found that cortical infusion of the GABA receptor antagonist bicuculline, which greatly reduces orientation selectivity in visual cortex, eliminates the ocular dominance shift that normally results from monocular deprivation. The second prediction, that the disconnection of the deprived eye depends on noise, has never been tested explicitly, but there are some indications that it is also correct. For example, clinical observations in humans led Jampolsky (1978) to conclude that the effects of monocular diffusion (resulting from lid suture) are more severe than the effects of monocular occlusion (resulting from an opaque eye patch or contact lens).

To determine the time equivalent of each iteration for the parameters used, the behavior of the model under monocular deprivation can be compared with the results of the corresponding experiment. Equivalence was established between the number of computer iterations and the duration of deprivation required for complete disconnection of the deprived eye (Clothiaux et al., 1991). Thus, using a fixed set of parameters, one has a direct correspondence between the temporal dynamics of synaptic change in the theory and in experiments. This can be used to analyze and compare the kinetics and outcome of theory and experiment for other manipulations. For example, in “reverse suture” the deprived eye is opened and the open eye is closed after a period of initial monocular deprivation. Experimentally, the newly closed eye shows a greatly reduced response in about 24 hours, but the recovery of the response to the newly opened eye generally does not begin for another 1-2 days (Mioche and Singer, 1989). The same difference between the time required to obtain the initial effect and that required for the reversal is seen in the model. The correspondence of theory and experiment is thus very close. The theoretical explanation for this result is that recovery requires the modification threshold to slide nearly to zero and, with the same parameters that were fixed for monocular deprivation, this takes approximately 24 hours.

¹ Addition of local correlations such as those suggested by the work of Mastronarde (1989) does not alter the results.


Similar comparisons for the other experimental manipulations are discussed in detail in Clothiaux et al. (1991). We conclude that, where the predictions of the theory have been tested, they are in good agreement with what is seen experimentally.

3.2 Neurobiological Foundations for the Assumptions of the BCM Theory

Recent advances in our understanding of excitatory amino acid (EAA) receptors have suggested a possible physiological basis for the BCM form of synaptic modification. In 1987, Bear et al. proposed that the modification threshold $\theta_M$ of BCM is related to the membrane potential at which the N-methyl-D-aspartate (NMDA) receptor-dependent Ca²⁺ flux reaches the threshold for inducing synaptic long-term potentiation (LTP). In support of the hypothesis that NMDA receptor mechanisms play a role in synaptic plasticity, Bear and co-workers have found that pharmacological blockade of NMDA receptors with the competitive antagonist AP5 disrupts the physiological (Kleinschmidt et al., 1987; Bear et al., 1990) and anatomical (Bear and Colman, 1990) consequences of monocular deprivation in striate cortex. Although the interpretation of these experiments is compromised by the finding that AP5 reduces visually evoked responses (Fox et al., 1989), the data indicate that activity evoked in visual cortex in the absence of NMDA receptor activation is not sufficient to produce a loss of closed-eye responsiveness in monocular deprivation.

In the past several years our work has focused on the synaptic plasticity that can be evoked in brain slices, in order to better investigate the assumptions of the BCM theory and to address possible underlying mechanisms (Connors and Bear, 1988; Press and Bear, 1990; Bear et al., 1992; Dudek and Bear, 1992; Kirkwood et al., 1992). Hippocampus, particularly CA1 and dentate gyrus, is an advantageous preparation because robust and long-lasting experience-dependent synaptic modifications can be evoked in this structure. Serena Dudek in Bear's lab (1992) recently tested a theoretical prediction that patterns of excitatory input activity that consistently fail to activate target neurons sufficiently to induce synaptic potentiation will instead cause a specific synaptic depression. To realize this situation experimentally, the Schaffer collateral projection to CA1 in rat hippocampal slices was stimulated electrically at frequencies ranging from 0.5 to 50 Hz. Nine hundred pulses at 1-3 Hz consistently yielded a depression of the CA1 population EPSP that persisted without signs of recovery for more than 1 hour after the conditioning stimulation ceased. This long-term depression was specific to the conditioned input and could be prevented by application of NMDA receptor antagonists. This result was surprising in that NMDA receptors are known to participate in the induction of long-term potentiation, an increase in synaptic effectiveness. Indeed, at higher stimulation frequencies the depression was replaced by potentiation. If the effects of varying stimulation frequency in the experiments of Dudek and Bear are explained by different values of the postsynaptic response (perhaps the integrated postsynaptic depolarization or Ca²⁺ level) during the conditioning stimulation, then it can be seen from Figure 3 that their data are in striking agreement with the assumptions of the BCM theory.

Of course, as striking as this similarity is, Dudek's work was performed in hippocampus, and the BCM theory was developed for visual cortex. And, although these two forms of synaptic plasticity (depression and potentiation) have been reported in sensory neocortex (cf. Artola et al., 1990), evidence to date has indicated that they occur with far lower probability, usually require pharmacological treatments for their induction, and are elicited by stimulation patterns that differ dramatically from those that are effective in hippocampus (see discussion in Bear et al., 1992). Together, these data have been taken as support for the view that hippocampus and sensory neocortex may be quite distinct with respect to their capability for synaptic change. However, a direct comparison of the plasticity of synaptic responses evoked in adult rat hippocampal field CA1 with those evoked in layer III of adult rat and immature cat visual cortex has now been carried out by Alfredo Kirkwood and colleagues in Bear's lab (1992). In the neocortical preparations they stimulated the direct input to layer III from layer IV, rather than using the traditional approach of stimulating the white matter, and found, contrary to the prevailing view, that very similar forms of plasticity, LTP and LTD, are evoked with precisely the same types of stimulation in the three types of cortex without the use of pharmacological treatments. Further, in all three preparations, both LTP and LTD depend on activation of NMDA receptors. These data suggest, first, that hippocampus should not be considered a privileged site for plasticity in the adult brain and, second, that a common principle may govern experience-dependent synaptic plasticity both in CA1 and throughout the superficial layers of the neocortex. We believe that this work represents an important advance toward a general theory of experience-dependent synaptic plasticity in the mammalian brain.

In our opinion, this work in its entirety gives strong justification for a form of modification similar to that assumed by BCM. However, the question of the sliding modification threshold remains open. Although more work remains to be done on this question, we note that two recent studies have shown that the sign and magnitude of synaptic modification in both hippocampus (Huang et al., 1992) and the Mauthner cell of goldfish (Yang and Faber, 1991) depend on the recent history of synaptic activation.

4 Reformulation and Extensions of the BCM Theory

In order to compare the BCM theory with other theories of synaptic plasticity, as well as to exhibit its information-processing and statistical properties, the following formulation proves convenient.

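4.1 Objective Function Formulation

In a recent statistical formulation of the BCM theory (Intrator and Cooper, 1992), the threshold $\theta_M$ was defined² as

$$\theta_M = E[(\mathbf{x} \cdot \mathbf{m})^2],$$

and an energy function corresponding to a risk function in statistical decision theory was presented, with $\mu$ a positive constant:

$$R_{\mathbf{m}} = -\mu \Big\{ \tfrac{1}{3} E[(\mathbf{x} \cdot \mathbf{m})^3] - \tfrac{1}{4} E^2[(\mathbf{x} \cdot \mathbf{m})^2] \Big\}. \tag{4.1}$$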

It was shown that the differential equations describing synaptic weight modification are a stochastic approximation of the negative gradient of the risk, and hence tend to minimize this risk (see Intrator and Cooper, 1992, for a review). This formulation permits us to demonstrate the connection between the unsupervised BCM learning procedure and various statistical methods, in particular that of Exploratory Projection Pursuit (Friedman, 1987). It also provides a general method for stability analysis of the fixed points of the theory and enables us to analyze the behavior and evolution of the network under various visual rearing conditions. In the next few sections we shall use this formulation to extend the theory to nonlinear neurons and, consequently, to a network of feed-forward inhibitory neurons.

² To use the same notation as in Intrator and Cooper (1992), we denote the input by the vector $\mathbf{x}$. This is equivalent to $\mathbf{d}$ used above.
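
The claim that BCM learning descends the risk (4.1) can be checked numerically. The sketch below (Python/NumPy, with illustrative random data) replaces expectations by sample means, computes $R_{\mathbf{m}}$, and compares its finite-difference gradient with $\mu\, E[\phi(c, \theta_M)\, \mathbf{x}]$, using $\theta_M = E[c^2]$ and the quadratic $\phi(c, \theta_M) = c(c - \theta_M)$ obtained by differentiating (4.1). The function names are hypothetical.

```python
import numpy as np


def risk(m, X, mu=1.0):
    """Sample version of (4.1): R = -mu * (E[c^3]/3 - E[c^2]^2/4), with c = x . m."""
    c = X @ m
    return -mu * (np.mean(c**3) / 3 - np.mean(c**2) ** 2 / 4)


def bcm_direction(m, X, mu=1.0):
    """mu * E[phi(c, theta_M) x] with phi(c, theta) = c (c - theta) and theta_M = E[c^2]."""
    c = X @ m
    theta = np.mean(c**2)
    return mu * (c * (c - theta)) @ X / len(X)


rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))        # 200 sample inputs x (illustrative)
m = rng.normal(size=3)

# Central finite-difference gradient of the risk
eps = 1e-6
grad = np.array([(risk(m + eps * e, X) - risk(m - eps * e, X)) / (2 * eps)
                 for e in np.eye(3)])
print(np.allclose(-grad, bcm_direction(m, X), atol=1e-5))
```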

4.2 Nonlinear Neurons

From statistical considerations motivated by the projection pursuit ideas, it is more effective to consider a nonlinear neuron that is less sensitive to possible outliers in the data. This is done by defining the neuron's activity as $c = \sigma(\mathbf{x} \cdot \mathbf{m})$, where $\sigma$ usually represents a smooth sigmoidal function. It is also desirable to be able to shift the projected distribution (of the input data) so that one of its peaks is at zero, by introducing a threshold $\beta$ so that the projection is defined to be $c = \sigma(\mathbf{x} \cdot \mathbf{m} + \beta)$. From the biological viewpoint, $\beta$ can be considered as spontaneous activity. The modification equations for finding the optimal threshold $\beta$ are easily obtained by observing that this threshold effectively adds one dimension to the input vector and to the vector of synaptic weights, so that $\mathbf{x} = (x_1, \ldots, x_n, 1)$ and $\mathbf{m} = (m_1, \ldots, m_n, \beta)$; therefore, $\beta$ can be found by using the same synaptic modification equations. For the rest of the paper we shall assume that this threshold is added to the projection, without writing it explicitly.

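For the nonlinear neuron, $\Theta_M$ is defined to be $\Theta_M = E[\sigma^2(x \cdot m)]$. The gradient of the risk becomes:

$$\nabla_m R_m = -\mu E[\phi(\sigma(x \cdot m), \Theta_M)\,\sigma'(x \cdot m)\,x], \qquad (4.2)$$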

where σ' represents the derivative of σ at the point $x \cdot m$. Note that the multiplication by σ' reduces the sensitivity of the differential equation to outliers, since for outliers σ' is close to zero. The gradient descent procedure is valid provided that the risk is bounded from below (cf. Intrator and Cooper, 1992).

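The same check can be sketched for the nonlinear neuron. This minimal sketch assumes σ = tanh as the smooth sigmoid (our choice; the text only requires a smooth sigmoidal function), μ = 1, and Gaussian toy inputs. The threshold β is handled as described above, by appending a constant 1 to each input so that the last weight plays the role of β. Helper names are hypothetical.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def augment(x):
    """Append a constant input of 1 so that the last synaptic weight acts as the threshold beta."""
    return list(x) + [1.0]

def sigma(u):
    return math.tanh(u)  # assumed sigmoid

def sigma_prime(u):
    return 1.0 - math.tanh(u) ** 2

def risk(m, data, mu=1.0):
    """R_m = -mu * (E[c^3]/3 - E[c^2]^2/4) with c = sigma(x . m + beta)."""
    cs = [sigma(dot(augment(x), m)) for x in data]
    e2 = sum(c ** 2 for c in cs) / len(cs)
    e3 = sum(c ** 3 for c in cs) / len(cs)
    return -mu * (e3 / 3.0 - e2 ** 2 / 4.0)

def grad_risk(m, data, mu=1.0):
    """Eq. (4.2): grad R_m = -mu * E[phi(c, Theta_M) sigma' x], Theta_M = E[c^2].

    sigma' is evaluated at x . m + beta, the argument of sigma.
    """
    us = [dot(augment(x), m) for x in data]
    cs = [sigma(u) for u in us]
    theta = sum(c ** 2 for c in cs) / len(cs)
    grad = [0.0] * len(m)
    for x, u, c in zip(data, us, cs):
        factor = -mu * c * (c - theta) * sigma_prime(u) / len(data)
        for i, xi in enumerate(augment(x)):
            grad[i] += factor * xi
    return grad

def finite_difference_grad(m, data, h=1e-6):
    grad = []
    for i in range(len(m)):
        up, down = list(m), list(m)
        up[i] += h
        down[i] -= h
        grad.append((risk(up, data) - risk(down, data)) / (2 * h))
    return grad

random.seed(1)
data = [[random.gauss(0.0, 1.0) for _ in range(2)] for _ in range(200)]
m = [0.4, -0.3, 0.1]  # (m_1, m_2, beta)

error = max(abs(a - b) for a, b in zip(grad_risk(m, data), finite_difference_grad(m, data)))
print(error)
# An outlier with a large projection contributes almost nothing, because sigma' is tiny there.
print(sigma_prime(10.0))
```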
4.3  Networks  with  Feed-Forward  Inhibition:  Application  to  Classification 

Intrator and Cooper (1992) have extended the single-cell theory to a feed-forward inhibition network which does not require the mean field approximation, nor does it require that the cortico-cortical synapses be constant. Thus it is possible to study networks with varying amounts of excitation and inhibition.

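The activity of neuron k in the network is $c_k = x \cdot m_k$, where $m_k$ is the synaptic weight vector of neuron k. The inhibited activity and threshold of the k'th neuron are given by

$$\tilde{c}_k = c_k - \eta \sum_{j \neq k} c_j, \qquad \tilde{\Theta}_M^k = E[\tilde{c}_k^2]. \qquad (4.3)$$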

The  relation  between  the  feed  forward  inhibition  network  and  the  mean  field  network  is  discussed 
in  Intrator  and  Cooper  (1992). 

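For the feed-forward network the risk for node k is given by:

$$R_k = -\mu\left\{\frac{1}{3}E[\tilde{c}_k^3] - \frac{1}{4}E^2[\tilde{c}_k^2]\right\}, \qquad (4.4)$$

and the total risk is given by

$$R = \sum_{k=1}^{N} R_k. \qquad (4.5)$$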
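It follows that the gradient of R becomes:

$$\frac{\partial R}{\partial m_k} = \frac{\partial R_k}{\partial m_k} + \sum_{j \neq k} \frac{\partial R_j}{\partial m_k} = -\mu\left\{E[\phi(\tilde{c}_k, \tilde{\Theta}_M^k)\,x] - \eta \sum_{j \neq k} E[\phi(\tilde{c}_j, \tilde{\Theta}_M^j)\,x]\right\}. \qquad (4.6)$$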


The equation performs a constrained minimization in which the derivative with respect to one neuron can become orthogonal (when η → 1) to the sum over the derivatives of all other synaptic weights. Nevertheless, the coupling between the neurons is very simple to calculate and does not require any matrix inversion. Equation (4.6) therefore allows a simple computational algorithm that performs exploratory projection pursuit of several projections in parallel.

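The network gradient (4.6) can be checked in the same way. This minimal sketch assumes N = 3 linear neurons, η = 0.3, μ = 1 and Gaussian toy inputs (all our choices); it computes the inhibited activities (4.3) and the total risk (4.4)-(4.5), and compares (4.6) with a finite-difference derivative. As the text notes, no matrix inversion is involved: each neuron's gradient only combines the per-neuron averages E[φ(c̃_j, Θ̃_M^j) x]. Function names are hypothetical.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def inhibited_activities(weights, x, eta):
    """Eq. (4.3): c~_k = c_k - eta * sum_{j != k} c_j, with c_k = x . m_k."""
    c = [dot(x, m_k) for m_k in weights]
    total = sum(c)
    return [c_k - eta * (total - c_k) for c_k in c]

def total_risk(weights, data, eta, mu=1.0):
    """Eqs. (4.4)-(4.5): R = sum_k -mu * (E[c~_k^3]/3 - E[c~_k^2]^2/4)."""
    acts = [inhibited_activities(weights, x, eta) for x in data]
    r = 0.0
    for k in range(len(weights)):
        e2 = sum(a[k] ** 2 for a in acts) / len(data)
        e3 = sum(a[k] ** 3 for a in acts) / len(data)
        r += -mu * (e3 / 3.0 - e2 ** 2 / 4.0)
    return r

def grad_total_risk(weights, data, eta, mu=1.0):
    """Eq. (4.6): dR/dm_k = -mu * (E[phi_k x] - eta * sum_{j != k} E[phi_j x])."""
    n_neurons, dim = len(weights), len(data[0])
    acts = [inhibited_activities(weights, x, eta) for x in data]
    thetas = [sum(a[k] ** 2 for a in acts) / len(data) for k in range(n_neurons)]
    # e_phi_x[k][i] estimates E[phi(c~_k, Theta~_k) x_i]
    e_phi_x = [[0.0] * dim for _ in range(n_neurons)]
    for x, a in zip(data, acts):
        for k in range(n_neurons):
            phi = a[k] * (a[k] - thetas[k])
            for i in range(dim):
                e_phi_x[k][i] += phi * x[i] / len(data)
    grads = []
    for k in range(n_neurons):
        others = [sum(e_phi_x[j][i] for j in range(n_neurons) if j != k) for i in range(dim)]
        grads.append([-mu * (e_phi_x[k][i] - eta * others[i]) for i in range(dim)])
    return grads

def finite_difference(weights, data, eta, h=1e-6):
    grads = []
    for k in range(len(weights)):
        row = []
        for i in range(len(weights[k])):
            up = [list(w) for w in weights]
            down = [list(w) for w in weights]
            up[k][i] += h
            down[k][i] -= h
            row.append((total_risk(up, data, eta) - total_risk(down, data, eta)) / (2 * h))
        grads.append(row)
    return grads

random.seed(2)
data = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(150)]
weights = [[random.uniform(-0.5, 0.5) for _ in range(4)] for _ in range(3)]
eta = 0.3

analytic = grad_total_risk(weights, data, eta)
numeric = finite_difference(weights, data, eta)
error = max(abs(a - b) for ra, rn in zip(analytic, numeric) for a, b in zip(ra, rn))
print(error)
```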
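When the nonlinearity of the neuron is included, the inhibited activity is defined (as in the single neuron case) as $\tilde{c}_k = \sigma(c_k - \eta \sum_{j \neq k} c_j)$; $\tilde{\Theta}_M^k$ and $R_k$ are defined as before. However, in this case

$$\frac{\partial \tilde{c}_k}{\partial m_j} = -\eta\,\sigma'(\tilde{c}_k)\,x \quad (j \neq k), \qquad \frac{\partial \tilde{c}_k}{\partial m_k} = \sigma'(\tilde{c}_k)\,x, \qquad (4.7)$$

where $\sigma'(\tilde{c}_k)$ denotes the derivative of σ at the point $c_k - \eta \sum_{j \neq k} c_j$. Therefore the total gradient becomes:

$$-\frac{\partial R}{\partial m_k} = \mu\left\{E[\phi(\tilde{c}_k, \tilde{\Theta}_M^k)\,\sigma'(\tilde{c}_k)\,x] - \eta \sum_{j \neq k} E[\phi(\tilde{c}_j, \tilde{\Theta}_M^j)\,\sigma'(\tilde{c}_j)\,x]\right\}. \qquad (4.8)$$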


This  biologically  motivated  system  of  equations  has  many  desirable  statistical  properties  and  has 
been  applied  to  various  non-trivial  feature  extraction  tasks  such  as  phoneme  recognition  (Intrator, 
1992)  and  3D  object  recognition  (Intrator  and  Gold,  1993). 


5 Comparison of BCM with Other Visual Cortical Plasticity Theories


In  order  to  compare  ideas  concerning  visual  cortical  plasticity,  it  is  important  to  analyze  separately 
the  different  components  that  make  up  a  theory,  and  to  compare  theories  feature  by  feature.  We 
consider  a  theory  as  being  composed  of  the  following  three  components: 

•  Synaptic  modification  equations. 

•  Model  of  the  input  environment. 

•  Network  architecture. 

In some cases, there are interactions between these components that are not explicitly defined. For example, several theories are said to be able to develop orientation selectivity prenatally, i.e., using random noise as the input environment (Linsker, 1986; Miller et al., 1989). Under closer examination, however, it turns out that they have architectural constraints that actually yield very different input environments. Inputs to the network become strongly locally correlated after the first layer due only to the network architecture (the arborization function), which determines the density of synapses as a function of planar distance from their target cell. This correlated input can then drive cells in the higher layers to become orientation selective. When the arborization function is uniform, all the weights of all layers will become positively saturated, and thus no selectivity will develop.
5.1  Comparison  Based  on  Synaptic  Modification  Equations 

In order to examine the effect of the synaptic modification equations in isolation, we shall fix the network inputs to be the same, and fix the architecture as well. The simplest architecture that already yields a significant difference between several models is that of a single cortical neuron receiving input from a single source (a single eye).

In  the  correlation  of  activity  models  (Sejnowski,  1977;  Linsker,  1986;  Kammen  and  Yuille,  1988; 
Yuille  et  al.,  1989;  Miller  et  al.,  1989)  the  input  is  defined  in  terms  of  the  correlation  of  activity  in 
the  presynaptic  afferents,  whereas  in  the  BCM  model  the  input  is  defined  in  terms  of  the  presynaptic 
activity. For reasons that will become clear, it is difficult to transform the correlation of activity models into presynaptic activity models; however, we can rewrite the BCM model as a correlation of activity model.

To  simplify  notation  and  without  loss  of  generality  we  shall  assume  that  the  input  activity  in 
each  ganglion  cell  has  zero  mean.  First  we  show  how  the  transformation  from  input  activity  to 
correlation  activity  is  done  by  expanding  on  footnote  15  of  Miller  et  al.  (1989).  This  will  be  done 
in  the  simple  case  of  a  single  cortical  neuron  with  no  interaction  between  LGN  inputs  coming  from 
the  two  eyes.  Miller’s  rule  has  the  following  form: 

$$\frac{dS(\alpha, t)}{dt} = \lambda A(-\alpha)\,[c(t) - c_1]\,a_\sigma(\alpha, t) - \gamma S(\alpha, t) - \epsilon A(-\alpha), \qquad (5.1)$$


where α is the afferent location, A is the arbor function representing the number (or, in the limit, the density) of afferents coming from location α, $a_\sigma(\alpha, t)$ is the afferent activity at location α, the subscript σ represents the addition of threshold and saturation effects, and $c_1$ is a constant. $c(t)$ is the neuronal activity, which in this simple case of no lateral interactions is given by $c(t) = \sum_\beta S(\beta, t)\,a_\sigma(\beta, t) + c_2$, where $c_2$ is some constant. The terms $\gamma S(\alpha, t)$ and $\epsilon A(-\alpha)$ are decay terms of the synaptic weights. Substituting c(t) into (5.1), denoting $c_3 = c_2 - c_1$, and taking the average over the input space (at a given afferent location) we obtain


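$$\frac{dS(\alpha, t)}{dt} = \lambda A(-\alpha)\left\{\sum_\beta S(\beta, t)\,E[a_\sigma(\beta, t)\,a_\sigma(\alpha, t)] + c_3 E[a_\sigma(\alpha, t)]\right\} - \gamma S(\alpha, t) - \epsilon A(-\alpha). \qquad (5.2)$$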


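Using $C(\alpha, \beta)$ to represent the correlation of activity, $C(\alpha, \beta) = E[a_\sigma(\alpha, t)\,a_\sigma(\beta, t)]$, we get

$$\frac{dS(\alpha, t)}{dt} = \lambda A(-\alpha)\left\{\sum_\beta S(\beta, t)\,C(\alpha, \beta) + c_3 E[a_\sigma(\alpha, t)]\right\} - \gamma S(\alpha, t) - \epsilon A(-\alpha), \qquad (5.3)$$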

which  is  a  simple  case  of  equation  (1)  in  Miller  et  al.  (1989). 
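Because the step from (5.1) to (5.3) is purely algebraic, it can be checked on a finite sample. In this minimal sketch the saturated activity $a_\sigma$ is modeled as a Gaussian clipped to [0, 1] (a hypothetical choice), the arbor function is taken as A = 1, and λ, γ, ε, $c_1$, $c_2$ are arbitrary constants. Averaging the instantaneous rule (5.1) over the samples agrees with the correlation form (5.3) up to rounding.

```python
import random

random.seed(3)
n_aff, n_samples = 4, 500
lam, gamma, eps, c1, c2 = 0.1, 0.01, 0.001, 0.2, 0.5  # arbitrary constants
A = 1.0  # uniform arbor function (assumption)
S = [random.uniform(0.0, 1.0) for _ in range(n_aff)]

def a_sigma(raw):
    # Hypothetical threshold/saturation effect: clip the raw activity to [0, 1]
    return min(max(raw, 0.0), 1.0)

samples = [[a_sigma(random.gauss(0.5, 0.3)) for _ in range(n_aff)] for _ in range(n_samples)]

def averaged_rule_51(alpha):
    """Sample average of eq. (5.1) with c(t) = sum_beta S(beta) a_sigma(beta, t) + c2."""
    total = 0.0
    for a in samples:
        c = sum(S[b] * a[b] for b in range(n_aff)) + c2
        total += lam * A * (c - c1) * a[alpha]
    return total / n_samples - gamma * S[alpha] - eps * A

def correlation_rule_53(alpha):
    """Eq. (5.3): lam * A * (sum_beta S(beta) C(alpha, beta) + c3 E[a_sigma(alpha)]) - decay terms."""
    c3 = c2 - c1
    corr = [sum(a[alpha] * a[b] for a in samples) / n_samples for b in range(n_aff)]
    mean_alpha = sum(a[alpha] for a in samples) / n_samples
    drive = sum(S[b] * corr[b] for b in range(n_aff)) + c3 * mean_alpha
    return lam * A * drive - gamma * S[alpha] - eps * A

for alpha in range(n_aff):
    print(alpha, averaged_rule_51(alpha), correlation_rule_53(alpha))
```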

An analogous reformulation is carried out below for the BCM modification equation

$$\frac{dm(\alpha, t)}{dt} = \lambda\,\phi(c(t), \Theta_M)\,a(\alpha, t), \qquad (5.4)$$

for the synaptic weight m (in Miller's notation this is S); for simplicity we omit decay terms and assume a uniform arbor function.

Using a simple form of the modification function, $\phi(c, \Theta_M) = c(c - \Theta_M)$ (Bienenstock et al., 1982), substituting $c(t) = \sum_\beta m(\beta)\,a(\beta, t)$ and taking the average over the input space at a given afferent location, we obtain

$$\frac{dm(\alpha)}{dt} = \lambda\left\{E\Big[a(\alpha)\sum_\beta m(\beta)a(\beta)\sum_\gamma m(\gamma)a(\gamma)\Big] - E\Big[a(\alpha)\sum_\beta m(\beta)a(\beta)\Big]\Theta_M\right\}. \qquad (5.5)$$

Wherever it is clear we omit the dependency of a(α) on t (assuming it is a stationary process). Using the same correlation function as defined above, and defining the third order correlation of the input activity to be $C(\alpha, \beta, \gamma) = E[a(\alpha)a(\beta)a(\gamma)]$, yields

$$\frac{dm(\alpha)}{dt} = \lambda\left\{\sum_{\beta,\gamma} C(\alpha, \beta, \gamma)\,m(\beta)m(\gamma) - \Theta_M \sum_\beta m(\beta)\,C(\alpha, \beta)\right\}. \qquad (5.6)$$

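Using the definition of the threshold $\Theta_M = E[(\sum_\alpha a(\alpha)m(\alpha))^2]$ (Intrator, 1990; Intrator and Cooper, 1992) and the second order correlation function C, $\Theta_M$ becomes $\Theta_M = \sum_{\gamma,\delta} C(\gamma, \delta)\,m(\gamma)m(\delta)$. Therefore, eq. (5.6) becomes

$$\frac{dm(\alpha)}{dt} = \lambda\left\{\sum_{\beta,\gamma} C(\alpha, \beta, \gamma)\,m(\beta)m(\gamma) - \sum_{\beta,\gamma,\delta} C(\alpha, \beta)\,C(\gamma, \delta)\,m(\beta)m(\gamma)m(\delta)\right\}. \qquad (5.7)$$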

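A similar sample check applies to (5.7). In this minimal sketch (skewed toy inputs and arbitrary weights, our choices; λ = 1) the sample average of φ(c, Θ_M)a(α), with $c = \sum_\beta m(\beta)a(\beta)$ and Θ_M = E[c²], is compared with the expression built from the second order correlation C(α, β) and the third order correlation C(α, β, γ) of the same sample.

```python
import random

random.seed(4)
n, T = 3, 400
# Skewed toy inputs (exponential minus its mean), so that third order correlations are non-zero
data = [[random.expovariate(1.0) - 1.0 for _ in range(n)] for _ in range(T)]
m = [0.5, -0.4, 0.3]

def E(f):
    """Sample average of f over the input presentations."""
    return sum(f(a) for a in data) / T

C2 = [[E(lambda a: a[i] * a[j]) for j in range(n)] for i in range(n)]
C3 = [[[E(lambda a: a[i] * a[j] * a[k]) for k in range(n)] for j in range(n)] for i in range(n)]

def bcm_average(alpha):
    """Sample average of phi(c, Theta_M) a(alpha), phi(c, Theta) = c (c - Theta), Theta_M = E[c^2]."""
    cs = [sum(m[b] * a[b] for b in range(n)) for a in data]
    theta = sum(c * c for c in cs) / T
    return sum(c * (c - theta) * a[alpha] for c, a in zip(cs, data)) / T

def correlation_form_57(alpha):
    """Eq. (5.7) with lambda = 1: third order term minus second order penalty term."""
    third = sum(C3[alpha][b][g] * m[b] * m[g] for b in range(n) for g in range(n))
    second = sum(C2[alpha][b] * C2[g][d] * m[b] * m[g] * m[d]
                 for b in range(n) for g in range(n) for d in range(n))
    return third - second

for alpha in range(n):
    print(alpha, bcm_average(alpha), correlation_form_57(alpha))
```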
A few important observations follow. The correlation-based models use only the first and second order statistical information² of the data, whereas BCM in addition utilizes the third order statistics of the input activity. Therefore, without going into the details of the limiting behavior, this already suggests that correlation-based models are less sensitive to the input environment. Analysis shows
that  in  many  cases  the  correlation  of  activity  models  find  the  principal  components  of  the  input 
environment  (Oja,  1982;  Kohonen,  1984;  Yuille  et  al.,  1989;  Miller  et  al.,  1989;  Sanger,  1989; 
Granger  et  al.,  1989).  In  the  following  section  we  discuss  some  properties  of  principal  components 
in  information  processing. 
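To illustrate how a correlation-based rule ends up at a principal component, the minimal sketch below runs Oja's (1982) rule, Δm = η c (x − c m) with c = x·m, on toy two-dimensional Gaussian inputs whose variance is 4 along the first axis and 1 along the second (our choice), with an assumed learning rate η = 0.005. The weight vector approaches unit length and aligns with the leading principal component. Oja's rule is used here only as one representative correlation-based rule, not as the specific rule of Linsker or Miller et al.

```python
import math
import random

random.seed(5)
eta = 0.005  # assumed learning rate
m = [0.3, 0.3]
for _ in range(10000):
    x = [random.gauss(0.0, 2.0), random.gauss(0.0, 1.0)]  # variances 4 and 1
    c = x[0] * m[0] + x[1] * m[1]
    # Oja's rule: Hebbian term c*x minus a decay term c^2*m that keeps |m| near 1
    m = [mi + eta * c * (xi - c * mi) for mi, xi in zip(m, x)]

norm = math.hypot(m[0], m[1])
print(m, norm)
```

Only the covariance of the inputs enters the average of this update, so no information beyond second order statistics can influence where it converges.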

5.1.1  Comparison  Based  on  the  Information  Extraction  Properties 

It now becomes relevant to ask what type of structure can be extracted from first and second moments only, and what constitutes an interesting structure. The first question is quite old and its answer is well known: first and second moments contain information about the principal components of the input distribution, which are the directions that minimize the error between the original data and the data reconstructed from only the first few leading components. Another way to view principal components is to observe that they maximize the variance of the projected distribution, namely the variance of the new random variable that is the projection of the inputs onto the principal components.

Principal  Components  and  Maximum  Information  Preservation 

Networks that extract principal components from data are numerous, e.g., (Sejnowski, 1977; Oja, 1982; Linsker, 1986; Kammen and Yuille, 1988; Yuille et al., 1989; Miller et al., 1989; Sanger, 1989; Granger et al., 1989). Linsker presented the principles guiding synaptic modification in his layered network and showed that the development rule causes a cell to develop so as to maximize the variance of its output activity, subject to constraints on the total connection strength and on each synaptic value (Linsker, 1988). Linsker then describes the connection of this rule to principal component analysis and to the principle of maximum information preservation taken from information theory³. This principle is optimal when the goal is to accurately reconstruct the input, but is not optimal when the goal is classification. This is shown in the following simple example (Figure 1; see also p. 212 of Duda and Hart, 1973). Two clusters, each belonging to a different class, are presented. The goal is to find a one-dimensional projection that captures the structure in the data. In Figure 1(B) the variance of each cluster differs between the two directions, whereas in Figure 1(A) the variance of each cluster is equal in both directions. In both cases the structure in the data is conveyed by the x projection. In (B), however, the variance is maximized by the y projection; this projection also minimizes the mean squared error (MSE) and is therefore superior from the standpoint of maximum information preservation. In (A), because the variance of each cluster is equal in both directions, the projection that captures the most structure in the data and the projection that preserves the maximum information coincide: both are the x projection.

²In other words, the mean and covariance matrix of the input activity.

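The Figure 1 situation can be reproduced with a small simulation. The cluster centers, spreads and sample sizes below are our assumptions, chosen to mimic panels (A) and (B). For each data set the sketch computes the leading principal direction of the pooled 2×2 covariance matrix, using the closed-form principal-axis angle ½·atan2(2b, a − d), and a simple class-separation score along each axis (difference of class means divided by the average within-class standard deviation).

```python
import math
import random

def make_clusters(sx, sy, n=2000, seed=6):
    """Two classes centred at x = -2 and x = +2 with per-cluster spreads sx, sy (hypothetical values)."""
    rng = random.Random(seed)
    left = [(rng.gauss(-2.0, sx), rng.gauss(0.0, sy)) for _ in range(n)]
    right = [(rng.gauss(2.0, sx), rng.gauss(0.0, sy)) for _ in range(n)]
    return left, right

def mean(v):
    return sum(v) / len(v)

def std(v):
    mu = mean(v)
    return math.sqrt(sum((t - mu) ** 2 for t in v) / len(v))

def leading_direction(points):
    """Unit eigenvector of the largest eigenvalue of the 2x2 sample covariance [[a, b], [b, d]]."""
    mx = mean([p[0] for p in points])
    my = mean([p[1] for p in points])
    a = mean([(p[0] - mx) ** 2 for p in points])
    d = mean([(p[1] - my) ** 2 for p in points])
    b = mean([(p[0] - mx) * (p[1] - my) for p in points])
    angle = 0.5 * math.atan2(2.0 * b, a - d)
    return math.cos(angle), math.sin(angle)

def separation(left, right, axis):
    """|difference of class means| / average within-class std along one axis."""
    l = [p[axis] for p in left]
    r = [p[axis] for p in right]
    return abs(mean(r) - mean(l)) / (0.5 * (std(l) + std(r)))

for panel, (sx, sy) in {"A": (0.5, 0.5), "B": (0.5, 4.0)}.items():
    left, right = make_clusters(sx, sy)
    direction = leading_direction(left + right)
    print(panel, direction, separation(left, right, 0), separation(left, right, 1))
```

In (A) the leading direction is close to the x axis; in (B) it is close to the y axis, although the classes are separated only along x.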
Another way to view what principal components do to the data is by observing that they define a new system of coordinates in which the covariance matrix is diagonal; namely, they eliminate the second order correlations in the data, i.e., the correlations between the projections of the input data onto any two principal components. It is important to note here that this procedure does not eliminate higher order correlations in the data.

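A toy example, assumed purely for illustration, of data whose second order correlation vanishes while a third order correlation remains: take s standard normal, y₁ = s and y₂ = s² − 1. Then E[y₁y₂] = E[s³] = 0, but E[y₁²y₂] = E[s⁴] − E[s²] = 2. The minimal sketch below rotates the centred sample into its principal axes, which makes the sample covariance exactly diagonal, and shows that a third order cross moment of the principal-component coordinates is still far from zero.

```python
import math
import random

random.seed(7)
N = 20000
s = [random.gauss(0.0, 1.0) for _ in range(N)]

def center(v):
    mu = sum(v) / len(v)
    return [t - mu for t in v]

def moment(*cols):
    """Sample mean of the elementwise product of the given columns."""
    return sum(math.prod(vals) for vals in zip(*cols)) / N

y1 = center(list(s))
y2 = center([t * t - 1.0 for t in s])
a, d, b = moment(y1, y1), moment(y2, y2), moment(y1, y2)
angle = 0.5 * math.atan2(2.0 * b, a - d)  # principal axes of the 2x2 covariance
p1 = [math.cos(angle) * u + math.sin(angle) * v for u, v in zip(y1, y2)]
p2 = [-math.sin(angle) * u + math.cos(angle) * v for u, v in zip(y1, y2)]

second_order = moment(p1, p2)  # zero up to rounding: PCA removes the second order correlation
third_order = max(abs(moment(p1, p1, p2)), abs(moment(p1, p2, p2)))  # stays near 2
print(second_order, third_order)
```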
Finding  Other  Interesting  Low-Dimensional  Structure  in  Data 

This problem has recently been discussed in the context of a statistical method called projection pursuit (PP) (see Huber, 1985, for a review). This method seeks structure that is exhibited by (linear) projections of the data, and is therefore relevant to neural network theory, since the activity of a neuron is believed to be a function of the projection of the inputs onto the vector of synaptic weights. Diaconis and Freedman (1984) have shown that for most high-dimensional clouds, most low-dimensional projections are approximately normal. This finding suggests that important information in the data is conveyed in those directions whose one-dimensional projected distribution is far from Gaussian. For example, two well-known measures of deviation from normality are skewness and kurtosis, which are functions of the first three and the first four moments of the distribution, respectively. These moments contain information about statistical correlations up to fourth order. Intrator (1990) has shown that a BCM neuron (given by equation 4.8) can find structure in the input distribution that exhibits deviation from normality in the form of multi-modality in the projected distributions. This type of deviation, which is measured by the first three moments of the distribution, is particularly useful for finding clusters in high-dimensional data (since, due to the sparsity of such data, clusters cannot be found directly in it) and is thus useful for classification or recognition tasks. Below, we give another interpretation of this projection index in light of the previous discussion.

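The two classical projection indices mentioned above can be computed directly from a projected sample. In this minimal sketch (the data set is our assumption) the inputs are two-dimensional: along the first axis they come from a mixture of two clusters with unequal weights (0.8 and 0.2), and along the second axis they are Gaussian. The projection onto the clustered axis has a clearly non-zero skewness (third moment), while the Gaussian projection has skewness and excess kurtosis (fourth moment) near zero.

```python
import random

def central_moment(v, k):
    mu = sum(v) / len(v)
    return sum((t - mu) ** k for t in v) / len(v)

def skewness(v):
    return central_moment(v, 3) / central_moment(v, 2) ** 1.5

def excess_kurtosis(v):
    return central_moment(v, 4) / central_moment(v, 2) ** 2 - 3.0

random.seed(8)
data = []
for _ in range(4000):
    centre = 0.0 if random.random() < 0.8 else 3.0  # unequal two-cluster mixture (assumption)
    data.append((random.gauss(centre, 0.5), random.gauss(0.0, 1.0)))

def project(points, w):
    """One-dimensional projection of 2-D points onto the direction w."""
    return [w[0] * p[0] + w[1] * p[1] for p in points]

clustered = project(data, (1.0, 0.0))
gaussian = project(data, (0.0, 1.0))
for name, proj in (("clustered", clustered), ("gaussian", gaussian)):
    print(name, round(skewness(proj), 2), round(excess_kurtosis(proj), 2))
```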
If we assume that the retina decorrelates the inputs (Atick and Redlich, 1992), then the covariance matrix C(α, β) is diagonal (assuming that the inputs have zero mean), and so, for eigenvalues e(α), equation (5.7) becomes:

$$\frac{dm(\alpha)}{dt} = \lambda\left\{\sum_{\beta,\gamma} C(\alpha, \beta, \gamma)\,m(\beta)m(\gamma) - e(\alpha)\,m(\alpha) \sum_\gamma e(\gamma)\,m(\gamma)^2\right\}.$$

This suggests that the BCM synaptic modification equation performs third order decorrelation of the inputs, subject to a penalty related to the size of the weights. When the second order statistics of the input data are not decorrelated, the modification equation can be thought of as trying to find a balance between third order and second order correlations in the data.

³For a review in this context, see Linsker (1988).

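For the decorrelated case, the collapse of the last term of (5.7) can be written out explicitly by substituting $C(\alpha, \beta) = e(\alpha)\delta_{\alpha\beta}$, with δ the Kronecker delta:

$$\sum_{\beta,\gamma,\delta} C(\alpha,\beta)\,C(\gamma,\delta)\,m(\beta)m(\gamma)m(\delta) = \sum_{\beta,\gamma,\delta} e(\alpha)\delta_{\alpha\beta}\,e(\gamma)\delta_{\gamma\delta}\,m(\beta)m(\gamma)m(\delta) = e(\alpha)\,m(\alpha)\sum_\gamma e(\gamma)\,m(\gamma)^2 = e(\alpha)\,m(\alpha)\,\Theta_M,$$

since in this case $\Theta_M = \sum_{\gamma,\delta} C(\gamma,\delta)\,m(\gamma)m(\delta) = \sum_\gamma e(\gamma)\,m(\gamma)^2$. The penalty on m(α) is thus proportional to m(α) itself, weighted by the input variance e(α) and by the current threshold Θ_M.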

5.2  Comparison  Based  on  Assumptions  About  the  Input  Environment 

We  summarize  in  Table  1  the  different  assumptions  about  the  environment  used  by  our  group 
and  by  Miller  and  colleagues  to  model  classical  visual  deprivation  experiments.  What  is  apparent 
from  this  comparison  is  the  different  emphasis  given  to  between-eye  and  within-eye  correlations  in 
activity.  To  quote  Mastronarde:  “Some  of  the  strongest  evidence  on  the  importance  of  correlated 
firing  in  development  comes  from  cases  where  local  correlations  in  activity  are  induced  by  sensory 
stimulation;  e.g.  formation  of  binocular  cells  in  visual  cortex  requires  binocularly  corresponding 
visual input to the two eyes (Hubel and Wiesel, 1965). There has been growing interest in a more
restricted  question:  what  is  the  role  in  development  of  the  correlated  activity  that  occurs  in  the 
spontaneous  discharge?”  (Mastronarde,  1989).  In  our  work  we  have  chosen  to  focus  on  the  influence 
of  activity  induced  by  sensory  stimulation  on  the  development  of  visual  cortex. 

5.3  Comparison  Based  on  Network  Architecture 

Although it is possible that network architecture plays an important role in the comparison of various ideas on visual cortical plasticity, in this chapter we have not analyzed different architectures. This will be dealt with in subsequent work.

6  Concluding  Remarks 

We have given a short account of the BCM theory of synaptic plasticity, its comparison with experiments in visual cortex, and possible cellular and molecular bases for the fundamental modification equations. In addition, we have shown that correlation-based models and BCM differ in the type of structure for which they search: correlation models use the first and second order statistics of the input activity, while BCM modification also uses third order statistics.

Evidence exists for a principal-component-type preprocessing that may be taking place in the retina; we suggest that BCM modification further preprocesses the visual inputs by reducing (extracting) third order statistical correlations. Extracting third order statistics from the visual environment is a natural extension of, and complement to, the extraction of second order statistics that may be done in the retina.

Statistical theory tells us that finite order statistics of the data are not sufficient to uniquely characterize the data distribution. However, the addition of the third order moment adds important features such as skewness to the description of the distribution (see Kendall and Stuart, 1977, for a review), and in this case it adds information about multimodality (Intrator, 1992). The method of principal components is sufficient for finding clusters in data when the variance of each cluster is relatively constant in all directions, since then the directions that maximize the variance also maximize the information conveyed for the purpose of classification. When this is not the case, the example of Figure 1 shows that the direction that maximizes the variance of the projection does not necessarily carry information useful for separating the two clusters, although it does find the direction that minimizes the mean squared error between the reconstructed signal and the input, as dictated by the principle of maximum information preservation.

It is possible that the principle of maximum information preservation is useful in retinal processing, in which an order of magnitude reduction in the number of cells occurs. We suggest that this principle is not general enough to account for the processing done in early visual cortex, and that such statistical properties provide a convenient framework for comparing various plasticity theories.

References 

Artola,  A.,  Brocher,  S.,  and  Singer,  W.  (1990).  Different  voltage  dependent  thresholds  for  the  induction  of 
long-term depression and long-term potentiation in slices of rat visual cortex. Nature, 347:69-72.

Atick,  J.  J.  and  Redlich,  N.  (1992).  What  does  the  retina  know  about  natural  scenes.  Neural  Computation, 
4:196-210. 

Bear,  M.  F.  and  Colman,  H.  (1990).  Binocular  competition  in  the  control  of  geniculate  cell  size  depends  upon 
visual cortical NMDA receptor activation. Proceedings of the National Academy of Sciences, 87:9246-9249.

Bear,  M.  F.,  Gu,  Q.,  Kleinschmidt,  A.,  and  Singer,  W.  (1990).  Disruption  of  experience-dependent  synaptic 
modification  in  the  striate  cortex  by  infusion  of  an  NMDA  receptor  antagonist.  Journal  of  Neuroscience, 
10:909-925. 

Bear,  M.  F.,  Press,  W.  A.,  and  Connors,  B.  W.  (1992).  Long-term  potentiation  of  slices  of  kitten  visual 
cortex  and  the  effects  of  NMDA  receptor  blockade.  Journal  of  Neurophysiology,  67:841-851. 

Bienenstock,  E.  L.,  Cooper,  L.  N.,  and  Munro,  P.  W.  (1982).  Theory  for  the  development  of  neuron  selectivity: 
orientation specificity and binocular interaction in visual cortex. Journal of Neuroscience, 2:32-48.

Connors,  B.  W.  and  Bear,  M.  F.  (1988).  Pharmacological  modulation  of  long  term  potentiation  in  slice  of 
visual cortex. In Soc. Neurosci. Abstr., volume 14, page 298.8.

Cooper,  L.  N.  and  Scofield,  C.  L.  (1988).  Mean-field  theory  of  a  neural  network.  Proceedings  of  the  National 
Academy  of  Science,  85:1973-1977. 

Diaconis,  P.  and  Freedman,  D.  (1984).  Asymptotics  of  graphical  projection  pursuit.  Annals  of  Statistics, 
12:793-815. 

Duda,  R.  O.  and  Hart,  P.  E.  (1973).  Pattern  Classification  and  Scene  Analysis.  John  Wiley,  New  York. 

Dudek, S. M. and Bear, M. F. (1992). Homosynaptic long-term depression in area CA1 of hippocampus and
the effects on NMDA receptor blockade. Proc. Natl. Acad. Sci., 89:4363-4367.

Fox,  K.,  Sato,  H.,  and  Daw,  N.  (1989).  The  location  and  function  of  NMDA  receptors  in  cat  and  kitten 
visual  cortex.  J.  Neurosci,  9:2443-2454. 

Friedman,  J.  H.  (1987).  Exploratory  projection  pursuit.  Journal  of  the  American  Statistical  Association, 
82:249-266. 

Granger,  R.,  Ambrose-Ingerson,  J.,  and  Lynch,  G.  (1989).  Derivation  of  encoding  characteristics  of  layer  II 
cerebral  cortex.  J.  Cognitive  Neuroscience,  1:61-87. 

Greuel,  J.  M.,  Luhmann,  H.  J.,  and  Singer,  W.  (1987).  Evidence  for  a  threshold  in  experience-dependent 
long-term  changes  of  kitten  visual  cortex.  Devel.  Brain  Research,  34:141-149. 

Huang, Y. Y., Colino, A., Selig, D. K., and Malenka, R. C. (1992). The influence of prior synaptic activity
on  the  induction  of  long-term  potentiation.  Science,  255:730-733. 




Hubel, D. H. and Wiesel, T. N. (1959). Integrative action in the cat's lateral geniculate body. J. Physiol,
148:574-591. 

Hubel, D. H. and Wiesel, T. N. (1965). J. Neurophysiol., 28:1041-1059.

Huber, P. J. (1985). Projection pursuit (with discussion). The Annals of Statistics, 13:435-475.

Intrator, N. (1990). A neural network for feature extraction. In Touretzky, D. S. and Lippmann, R. P., editors,
Advances in Neural Information Processing Systems, volume 2, pages 719-726. Morgan Kaufmann, San
Mateo,  CA. 

Intrator,  N.  (1992).  Feature  extraction  using  an  unsupervised  neural  network.  Neural  Computation,  4:98-107. 

Intrator,  N.  and  Cooper,  L.  N.  (1992).  Objective  function  formulation  of  the  BCM  theory  of  visual  cortical 
plasticity:  Statistical  connections,  stability  conditions.  Neural  Networks,  5:3-17. 

Intrator,  N.  and  Gold,  J.  I.  (1993).  Three-dimensional  object  recognition  of  gray  level  images:  The  usefulness 
of  distinguishing  features.  Neural  Computation.  In  press. 

Jampolsky,  A.  (1978).  Unequal  visual  inputs  and  strabismus  management:  A  comparison  of  human  and 
animal strabismus. In Symposium on Strabismus (Transactions of New Orleans Academy of Ophthalmology),
page 358. The C. V. Mosby Co.

Kammen, D. and Yuille, A. (1988). Spontaneous symmetry-breaking energy functions and the emergence of
orientation  selective  cortical  cells.  Biological  Cybernetics,  59:23-31. 

Kendall,  M.  and  Stuart,  A.  (1977).  The  Advanced  Theory  of  Statistics,  volume  1.  MacMillan  Publishing, 
New  York. 

Kirkwood, A., Dudek, S. M., Gold, J. T., Aizenman, C., and Bear, M. F. (1992). Common forms of synaptic
plasticity in Hippocampus and Neocortex in vitro. Preprint.

Kleinschmidt,  A.,  Bear,  M.  F.,  and  Singer,  W.  (1987).  Blockade  of  NMDA  receptors  disrupts  experience- 
dependent  plasticity  of  kitten  striate  cortex.  Science,  238:355-358. 

Kohonen, T. (1984). Self-Organization and Associative Memory. Springer-Verlag, Berlin.

Linsker,  R.  (1986).  From  basic  network  principles  to  neural  architecture  (series).  Proceedings  of  the  National 
Academy  of  Science,  83:7508-7512,  8390-8394,  8779-8783. 

Linsker, R. (1988). Self-organization in a perceptual network. IEEE Computer, 88:105-117.

Mastronarde, D. N. (1989). Correlated firing of cat retinal ganglion cells. Trends in Neuroscience, 12:75-80.

Miller,  K.  D.,  Keller,  J.,  and  Stryker,  M.  P.  (1989).  Ocular  dominance  column  development:  Analysis  and 
simulation.  Science,  240:605-615. 

Mioche,  L.  and  Singer,  W.  (1989).  Chronic  recordings  from  single  sites  of  kitten  striate  cortex  during 
experience-dependent modifications of receptive-field properties. J. Neurophysiol, 62:185-197.

Oja,  E.  (1982).  A  simplified  neuron  model  as  a  principal  component  analyzer.  Math.  Biology,  15:267-273. 

Press,  W.  A.  and  Bear,  M.  F.  (1990).  Effects  of  disinhibition  on  LTP  induction  in  slices  of  visual  cortex.  In 
Soc. Neurosci. Abstr., volume 16, page 348.9.

Ramoa,  A.  S.,  Paradiso,  M.  A.,  and  Freeman,  R.  D.  (1988).  Blockade  of  intracortical  inhibition  in  kitten 
striate  cortex:  Effects  on  receptive  field  properties  and  associated  loss  of  ocular  dominance  plasticity. 
Experimental  Brain  Research,  73:285-296. 

Reiter,  H.  O.  and  Stryker,  M.  P.  (1988).  Neural  plasticity  without  action  potentials:  Less  active  inputs 
become  dominant  when  kitten  visual  cortical  cells  are  pharmacologically  inhibited.  Proceedings  of  the 
National  Academy  of  Science,  85:3623-3627. 




Sanger,  T.  D.  (1989).  Optimal  unsupervised  learning  in  a  single-layer  linear  feedforward  neural  network. 
Neural  Networks,  2:459-473. 

Scofield, C. L. and Cooper, L. N. (1985). Development and properties of neural networks. Contemp. Phys.,
26:125-145. 

Sejnowski,  T.  J.  (1977).  Storing  covariance  with  nonlinearly  interacting  neurons.  Journal  of  Mathematical 
Biology,  4:303-321. 

Singer,  W.  (1977).  Effects  of  monocular  deprivation  on  excitatory  and  inhibitory  pathways  in  cat  striate 
cortex.  Experimental  Brain  Research,  134:508-518. 

Yang,  X.  and  Faber,  D.  S.  (1991).  Initial  synaptic  efficacy  influences  induction  and  expression  of  long-term 
changes  in  transmission.  Proceedings  of  the  National  Academy  of  Science,  88(10):4299-4303. 

Yuille,  A.,  Kammen,  D.,  and  Cohen,  D.  (1989).  Quadrature  and  the  development  of  orientation  selective 
cortical cells by Hebb rules. Biological Cybernetics, 61:183-194.




Table  Caption 

Table 1 Summary of the different assumptions about the input environment made by Clothiaux et al. (1991) and Miller et al. (1989). (Normal Rearing (NR), Monocular Deprivation (MD), and Strabismus (ST)).


Figure  Captions 

Figure  1  Principal  components  find  useful  structure  in  data  (A)  and  fail  when  the  variance  of  each  cluster 
is  different  in  each  direction  (B). 

Figure 2 The φ function for two different

Figure 3 Comparison of experimental observations with the BCM φ function for synaptic modification. Data
replotted  from  Dudek  and  Bear  (1992). 
Table 1:

NR
  Clothiaux et al. (1991):
    •  Patterned input (correlated activity within the eye with addition of noise) from both eyes.
    •  Correlation between eyes.
  Miller et al. (1989):
    •  Locally correlated input from both eyes.
    •  No correlation between eyes.

MD
  Clothiaux et al. (1991):
    •  Patterned input from the open eye.
    •  Uncorrelated noise from the deprived eye.
  Miller et al. (1989):
    •  Same correlation structure as NR.
    •  Reduced activity to the deprived eye.

ST
  Clothiaux et al. (1991):
    •  Patterned input from each eye.
    •  No correlation between eyes.
  Miller et al. (1989):
    •  Locally correlated input from both eyes.
    •  Anticorrelation between eyes.